Bioinformatics: A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

C+T, and C). The order of the nucleotides (A, C, G, and T) in the DNA sequence can then be determined from the bands on the gel.

On the other hand, the steps of the Sanger sequencing method are similar to those of the polymerase chain reaction (PCR), including denaturing, primer annealing, and complementary strand synthesis by polymerase. However, in Sanger sequencing, the sample DNA is divided into four reaction tubes labeled ddATP, ddGTP, ddCTP, and ddTTP. The four types of deoxynucleotide triphosphates (dATP, dGTP, dCTP, and dTTP) are added to all four tubes as in PCR, but one of the four radio-labeled dideoxynucleotide triphosphates (ddATP, ddGTP, ddCTP, or ddTTP) is also added to each tube, as labeled, to terminate the DNA synthesis at positions of a known nucleotide. A ddNTP lacks the 3'-hydroxyl group needed to attach the next nucleotide, so synthesis stops wherever one is incorporated. The synthesis termination results in DNA fragments of varying lengths, each ending with the labeled ddNTP. Those fragments are then separated by size using gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions running in a separate lane labeled A, T, G, and C. The fragments are separated by length; the smaller fragments move faster in the gel. The DNA bands are then visualized by autoradiography, and the order of the nucleotide bases in the DNA sequence can be read directly from the X-ray film or the gel image.
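The way a sequence is read off the four lanes can be illustrated with a small simulation. The following Python sketch is an idealized model and not a laboratory protocol: it assumes every possible termination event occurs exactly once, ignores the primer (fragment lengths are counted from the first newly added nucleotide), and uses hypothetical function names (synthesized_strand, run_lanes, read_gel) and a made-up template chosen only for illustration.

```python
# Idealized model of Sanger chain termination and gel reading.
# Assumptions: one fragment per possible termination site, no primer,
# and perfect size separation on the gel.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Return the strand built by the polymerase, written 5'->3'.

    This is the reverse complement of the template strand.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template))


def run_lanes(new_strand):
    """Return the fragment lengths that appear in each of the four lanes.

    A fragment of length n shows up in lane X when the n-th nucleotide
    of the new strand is X, because the matching ddNTP stopped synthesis there.
    """
    lanes = {base: [] for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the bands from shortest to longest fragment across all lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGATC"  # made-up template, for illustration only
new_strand = synthesized_strand(template)
lanes = run_lanes(new_strand)
print(lanes)
print(read_gel(lanes))
```

Reading the bands from the shortest fragment (which travels farthest) to the longest recovers the newly synthesized strand, which is the reverse complement of the template in this model.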

1.2.2 Next-Generation Sequencing

Next-generation sequencing (NGS) was invented a few decades after the invention of first-generation sequencing. Unlike first-generation sequencing, NGS produces a massive number of sequences from a single sample in a short period of time and at lower cost, and it can process multiple samples simultaneously. Millions to billions of DNA fragments are sequenced in parallel, yielding a massive volume of sequence data. With NGS, millions of prokaryotic, eukaryotic, and viral genomes have been sequenced. Rather than chain termination, NGS uses a library of fragmented DNA to determine the order of the nucleotides in a target sequence. NGS is used in many applications, including sequencing of the whole genome, the whole transcriptome, targeted genes or transcripts, and the genomic regions where epigenetic modifications or protein interactions take place. Hence, NGS can be used for genome assembly, mutation or variant discovery, gene expression studies, epigenetics, and metagenomics. Those applications are discussed in detail in the next chapters.

After DNA or RNA sample collection, the first step of the NGS process is library preparation, in which the sequencing libraries are constructed for the DNA or RNA sample of interest. RNA is converted into complementary DNA (cDNA) before library preparation. Library preparation involves breaking the DNA into small fragments (fragmentation step) using sonication or enzymes. The size of the fragments can be adjusted to a specific length or range. Fragmentation is followed by repairing or blunting the ends of the fragments that have unpaired or overhanging nucleotides (end-repair step). The next step is the ligation of adaptors to the ends of the DNA fragments. The adaptors are artificially synthesized sequences that include certain parts to serve specific purposes. The free end of a ligated adaptor is made of an anchoring sequence that can attach to the surface of the flow cell slide where sequencing takes place. The adaptor also includes universal primers